Data set from 2018.

Plot 1: Developer Gender vs Developer Type.

This graph shows how many women and men work in each field of IT. Men clearly outnumber women across IT, and most of the women work as back-end developers. The majority of IT specialists work as full-stack, front-end, or back-end developers. To plot this graph, we selected the columns containing developer types and gender and counted how many respondents of each gender work in each field.

Plot 2: Stack Overflow usage in different age groups.

This graph shows how many people at different stages of their IT career use Stack Overflow. Students, who are young and have little experience, use Stack Overflow all the time and also have a Stack Overflow account. More experienced users do not depend on Stack Overflow as much as their younger coworkers. They also do not have an account. To plot the graph, we selected the columns corresponding to frequency of visits, having an account, and years of professional coding. We used the ggplot library with the geom_jitter() method.

Plot 3: Countries and the operating systems they use.

This graph shows the most widely used operating system in each country. The categories are Linux, Windows, MacOS, and Linux-based systems. The most popular operating system in Europe and America is Windows. In Africa, IT specialists and students in a few countries mostly use MacOS (Mali, Niger, Namibia, Botswana) or Linux (Mozambique, Kenya, Senegal, Guinea). In Asia, MacOS dominates in Japan and Thailand. To draw this graph, we used the geopandas library and matplotlib.

Plot 4: Mean satisfaction and usage of programming languages

This graph presents the mean satisfaction and usage of programming languages, ordered by paradigm. The accompanying table of languages is ordered by paradigm as well. The graph on the left shows the mean job satisfaction score for each programming language paradigm, where each segment represents a paradigm and the distance from the center indicates the average satisfaction level. The graph on the right displays the count of programming languages within each paradigm. Together, the two graphs give insight into the relationship between programming language paradigms and job satisfaction, as well as the distribution of languages across paradigms.

List of Languages ordered by Paradigm

Summary.

  1. Cleaning code.

In a separate Python file, prior to generating graphs, we meticulously pre-processed various datasets. These datasets comprised the developer survey data from 2018, including files delineating gender and developer types, salary alongside years of coding, and operating systems categorized by country.

  2. Information about data sets.
  • gender and developer types - from 2018, used in first plot

Initially, each respondent has many values in a single column called developer type. We split these types into separate columns and created a new matrix of gender and developer type with ‘0’ and ‘1’ values.

X Gender System.administrator Student Game.or.graphics.developer Designer Embedded.applications.or.devices.developer Marketing.or.sales.professional Back.end.developer Data.scientist.or.machine.learning.specialist Data.or.business.analyst Full.stack.developer DevOps.specialist C.suite.executive..CEO..CTO..etc.. Engineering.manager Mobile.developer Product.manager Database.administrator QA.or.test.developer Front.end.developer Educator.or.academic.researcher Desktop.or.enterprise.applications.developer
0 Male 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
1 Male 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0
3 Male 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
4 Male 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1
5 Male 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 0
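
The split into a 0/1 matrix described above can be sketched with pandas as follows. This is a minimal illustration, not the exact cleaning script: the column names `Gender` and `DevType`, the `;` separator, the sample rows, and the helper `expand_dev_types` are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample of the raw survey: one row per respondent, with all
# developer types packed into one ";"-separated string (assumed format).
raw = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "DevType": [
        "Full-stack developer",
        "Back-end developer;Student",
        "Back-end developer;Full-stack developer;DevOps specialist",
    ],
})


def expand_dev_types(df, column="DevType", sep=";"):
    """Return Gender plus one 0/1 indicator column per developer type."""
    indicators = df[column].str.get_dummies(sep=sep)
    return pd.concat([df[["Gender"]], indicators], axis=1)


matrix = expand_dev_types(raw)

# Respondents per gender and developer type: the kind of count shown in Plot 1.
counts = matrix.groupby("Gender").sum()
print(counts)
```

Summing the indicator columns per gender gives the counts directly, because each cell is 1 only when the respondent listed that developer type.
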
  • stack and years coding file - from 2018, used in second plot

Here we also created a new dataset, which contains information about having an account, the frequency of using Stack Overflow, and the experience (in years) of each respondent.

  • operating system and countries - from 2018, used in third plot

We created a new dataset consisting of the name of each country and the name of the operating system mostly used there. Some country names differ from those in the geopandas library, so we modified them to match.

X  Country                   OperatingSystem
0  Kenya                     Linux-based
1  United Kingdom            Linux-based
3  United States of America  Windows
4  South Africa              Windows
5  United Kingdom            Linux-based
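
One possible way to harmonize the country names and pick the dominant operating system per country is sketched below. The rename table, the sample rows, and the helper `dominant_os_by_country` are hypothetical; the real list of renamed countries is not given in this report.

```python
import pandas as pd

# Hypothetical per-respondent data; the values are illustrative only.
responses = pd.DataFrame({
    "Country": ["United States", "United States", "United States",
                "Kenya", "Kenya", "United Kingdom"],
    "OperatingSystem": ["Windows", "Windows", "MacOS",
                        "Linux-based", "Linux-based", "Windows"],
})

# Assumed example of a spelling that differs between the survey and the map data.
COUNTRY_RENAMES = {"United States": "United States of America"}


def dominant_os_by_country(df, renames):
    """Harmonize country names, then keep the most frequent OS per country."""
    cleaned = df.assign(Country=df["Country"].replace(renames))
    return (
        cleaned.groupby("Country")["OperatingSystem"]
        .agg(lambda s: s.value_counts().idxmax())
        .reset_index()
    )


dominant = dominant_os_by_country(responses, COUNTRY_RENAMES)
print(dominant)
```

The resulting table has one row per country, so it can be joined to the map geometries on the country name before plotting.
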
  • languages and job satisfaction - from 2018, used in fourth plot

We created a function which takes a DataFrame containing information about programming languages worked with and job satisfaction, expands the DataFrame to include one column for each programming language, and fills these columns based on the languages worked with by each respondent.

X JobSatisfaction TypeScript Objective.C Cobol Julia Rust Groovy F. VBA C. Perl Matlab VB.NET C.. Kotlin JavaScript Lua Scala Hack C Assembly Visual.Basic.6 PHP Swift CoffeeScript Bash.Shell Ocaml Java Go CSS SQL R Python Haskell Erlang HTML Delphi.Object.Pascal Ruby Clojure
0 Extremely satisfied 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0
1 Moderately dissatisfied 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
3 Neither satisfied nor dissatisfied 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0 0
4 Slightly satisfied 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0
5 Moderately satisfied 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0
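
A function of the kind described above, plus the per-language mean satisfaction and usage used in Plot 4, might look like the sketch below. The column name `LanguageWorkedWith`, the `;` separator, the sample rows, and especially the 1 to 7 numeric coding of the satisfaction labels are assumptions for illustration. The grouping of languages into paradigms is not covered here.

```python
import pandas as pd

# Assumed numeric coding of the answer labels (1 = worst, 7 = best).
SATISFACTION_SCORE = {
    "Extremely dissatisfied": 1,
    "Moderately dissatisfied": 2,
    "Slightly dissatisfied": 3,
    "Neither satisfied nor dissatisfied": 4,
    "Slightly satisfied": 5,
    "Moderately satisfied": 6,
    "Extremely satisfied": 7,
}

# Hypothetical raw rows: languages packed into one ";"-separated string.
raw = pd.DataFrame({
    "JobSatisfaction": ["Extremely satisfied", "Moderately dissatisfied",
                        "Slightly satisfied"],
    "LanguageWorkedWith": ["JavaScript;Python", "JavaScript;Bash/Shell",
                           "Python;R"],
})


def expand_languages(df, column="LanguageWorkedWith", sep=";"):
    """Return JobSatisfaction plus one 0/1 column per programming language."""
    indicators = df[column].str.get_dummies(sep=sep)
    return pd.concat([df[["JobSatisfaction"]], indicators], axis=1)


expanded = expand_languages(raw)
scores = expanded["JobSatisfaction"].map(SATISFACTION_SCORE)
languages = expanded.drop(columns="JobSatisfaction")

# Usage: number of respondents who worked with each language.
usage = languages.sum()
# Mean satisfaction: total score of a language's users divided by its user count.
mean_satisfaction = languages.mul(scores, axis=0).sum() / usage
print(pd.DataFrame({"usage": usage, "mean_satisfaction": mean_satisfaction}))
```
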

Data set from 2023.

Plot 5: Distribution of grouped developer types across organization sizes.

This graph shows the distribution of grouped developer types across companies of different sizes. We count how many data scientists, developers, and managers work in companies of each size. In every size category, developers are the most common group and make up the majority of employees. Like the fourth graph, this one is interactive.

Plot 6: Number of people and their experience in years.

This graph shows the relationship between years of professional coding and years of non-professional coding. It also shows the salary of each group. The majority of people in the data set have worked in IT for 2 to 15 years and earn an average salary. People who have worked for about 20-30 years earn much more, about 100000 USD per year. The graph uses a sample of 100 respondents, which makes it more readable. It is also easy to see that people start coding about 5 years before their career begins, probably because of their studies or courses. The data set is from 2023.

Summary.

  1. Cleaning code.

Similarly, we processed data from the 2023 developer survey, encompassing files detailing time and search responses, knowledge distribution and frequency, age juxtaposed with years of experience, developer types in relation to organizational size, and salary aligned with years of coding.

  2. Information about data sets.
  • developer type and organisation size - from 2023, used in fifth graph

We created a function which groups developer types into broader categories, sorts organization sizes, and saves the resulting DataFrame to a CSV file for further analysis.

X      DevType                                        OrgSize                                             Grouped_DevType
50737  Developer, full-stack                          Just me - I am a freelancer, sole proprietor, etc.  Developer
22682  Developer, back-end                            Just me - I am a freelancer, sole proprietor, etc.  Developer
79944  Developer, desktop or enterprise applications  Just me - I am a freelancer, sole proprietor, etc.  Developer
79941  Developer, front-end                           Just me - I am a freelancer, sole proprietor, etc.  Developer
47581  Developer, full-stack                          Just me - I am a freelancer, sole proprietor, etc.  Developer
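
The grouping and sorting step could be sketched as below. The grouping rules in `group_dev_type`, the shortened list of organization sizes, the sample rows, and the output file name are hypothetical; only the column names `DevType`, `OrgSize`, and `Grouped_DevType` come from the table above.

```python
import pandas as pd

# Assumed (and shortened) order of organization sizes, smallest first.
ORG_SIZE_ORDER = [
    "Just me - I am a freelancer, sole proprietor, etc.",
    "2 to 9 employees",
    "10 to 19 employees",
    "20 to 99 employees",
    "100 to 499 employees",
]


def group_dev_type(dev_type):
    """Map a detailed developer type to a broad category (hypothetical rules)."""
    lowered = dev_type.lower()
    if lowered.startswith("developer"):
        return "Developer"
    if "data" in lowered:
        return "Data"
    if "manager" in lowered or "executive" in lowered:
        return "Management"
    return "Other"


def prepare(df):
    """Add Grouped_DevType and sort rows by organization size."""
    out = df.dropna(subset=["DevType", "OrgSize"]).copy()
    out["Grouped_DevType"] = out["DevType"].map(group_dev_type)
    # An ordered categorical sorts by the listed order, not alphabetically.
    # Labels missing from ORG_SIZE_ORDER would become NaN here.
    out["OrgSize"] = pd.Categorical(
        out["OrgSize"], categories=ORG_SIZE_ORDER, ordered=True
    )
    return out.sort_values("OrgSize", kind="stable")


raw = pd.DataFrame({
    "DevType": [
        "Data scientist or machine learning specialist",
        "Developer, full-stack",
        "Engineering manager",
        "Developer, back-end",
    ],
    "OrgSize": [
        "20 to 99 employees",
        "Just me - I am a freelancer, sole proprietor, etc.",
        "2 to 9 employees",
        "20 to 99 employees",
    ],
})

prepared = prepare(raw)
# prepared.to_csv("devtype_orgsize.csv")  # hypothetical output file name
print(prepared)
```

Sorting on an ordered categorical keeps "Just me" before "2 to 9 employees", which a plain alphabetical sort of the labels would not do.
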
  • salary and years coding - from 2023, used in sixth graph

We created a new dataset containing the years of coding, both professional and non-professional. The code we wrote selects the columns related to salary and coding experience, drops any rows with missing values, and renames the columns for clarity. We also standardized the salary to USD.

X  Salary   SalaryUSD  YearsCode  YearsCodePro  SalaryUSD.1
1  285000   285000     18         9             285000
2  250000   250000     27         23            250000
3  156000   156000     12         7             156000
4  1320000  23456      6          4             23456
5  78000    96828      21         21            96828
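
The select, drop, and rename steps, together with the 100-respondent sample used in Plot 6, can be sketched as follows. The raw column names `CompTotal` and `ConvertedCompYearly`, the sample rows, and the choice to treat non-numeric answers such as "Less than 1 year" as missing are assumptions for this example, not a description of the original script.

```python
import pandas as pd

# Hypothetical raw rows; the raw column names are assumed.
raw = pd.DataFrame({
    "CompTotal": [285000, 250000, None, 1320000, 50000],
    "ConvertedCompYearly": [285000, 250000, 90000, 23456, 50000],
    "YearsCode": ["18", "27", "10", "6", "Less than 1 year"],
    "YearsCodePro": ["9", "23", "5", "4", "Less than 1 year"],
})

# Raw column name -> clearer name used in the cleaned data set.
RENAMES = {
    "CompTotal": "Salary",
    "ConvertedCompYearly": "SalaryUSD",
    "YearsCode": "YearsCode",
    "YearsCodePro": "YearsCodePro",
}


def prepare_salary(df):
    """Select salary and experience columns, rename them, drop incomplete rows."""
    out = df[list(RENAMES)].rename(columns=RENAMES).copy()
    for name in ("YearsCode", "YearsCodePro"):
        # Assumption: answers that are not plain numbers count as missing.
        out[name] = pd.to_numeric(out[name], errors="coerce")
    return out.dropna().reset_index(drop=True)


clean = prepare_salary(raw)

# Fixed-seed sample of at most 100 respondents to keep the plot readable.
sample = clean.sample(n=min(100, len(clean)), random_state=0)

# Years of coding before the professional career started.
years_before_career = clean["YearsCode"] - clean["YearsCodePro"]
print(clean)
```
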